Chromosome X-wide association study in case control studies of pathologically confirmed Alzheimer’s disease in a European population

Although several genome-wide association studies have highlighted genetic variants associated with Alzheimer’s disease (AD), the X chromosome is often excluded from the analysis. We conducted an X-chromosome-wide association study (XWAS) in three independent studies with a pathologically confirmed phenotype (total 1970 cases and 1113 controls). The XWAS was performed in males and females separately, and these results were then meta-analysed. Four suggestively associated genes, each replicating across at least two studies, were identified which may be of interest for further study in AD: DDX53 (rs12006935, OR = 0.52, p = 6.9e-05), IL1RAPL1 (rs6628450, OR = 0.36, p = 4.2e-05; rs137983810, OR = 0.52, p = 0.0003), TBX22 (rs5913102, OR = 0.74, p = 0.0003) and SH3BGRL (rs186553004, OR = 0.35, p = 0.0005; rs113157993, OR = 0.52, p = 0.0003). The SNP rs5913102 in TBX22 achieves chromosome-wide significance in the meta-analysed data. DDX53 shows highest expression in astrocytes, IL1RAPL1 is most highly expressed in oligodendrocytes and neurons, and SH3BGRL is most highly expressed in microglia. We have also identified SNPs in the NXF5 gene at chromosome-wide significance in females (rs5944989, OR = 0.62, p = 1.1e-05) but not in males (p = 0.83). The discovery of relevant AD-associated genes on the X chromosome may identify AD risk differences and similarities based on sex and lead to the development of sex-stratified therapeutics.


INTRODUCTION
Alzheimer's Disease (AD) is the most common form of dementia, accounting for between 60 and 80% of dementia cases [1]. The prevalence of AD in women is higher than that in men [1,2]. This may be because women tend to live longer than men [3,4]. However, some studies have suggested that women over 80 years may be more likely to have AD than men of the same age [5]. The effect of sex has a varying impact on AD over the course of the disease [6]. The duration of ovarian hormone exposure protects against dementia [7], i.e., shorter oestrogen exposure in women was associated with higher dementia risk in the UK Biobank [8]. A number of genetic loci have sex-specific effects on AD [6]; for example, the risk of AD associated with a given APOE genotype changes with sex [9]. The presentation of disease can differ by the sex of the individual with AD, highlighting the impact of sex on disease heterogeneity. Men and women demonstrate different cognitive and psychiatric symptoms. After diagnosis of either mild cognitive impairment (MCI) or AD, women show faster cognitive decline. For those with MCI, brain atrophy is faster in women compared to men [10]. Females have been shown to have a higher prevalence and severity of neuropsychiatric symptoms associated with AD, and males have been shown to have more severe apathy [11].
Although several genome-wide association studies (GWAS) have highlighted genetic variants associated with AD, the X chromosome is often excluded from the analysis. It is often excluded because of analytical challenges caused by unique features such as transcriptional silencing of one allele in females, hemizygosity in males and recombination patterns [12]. A PubMed search of chromosome X specific association analyses and AD from 2000 to the present day identified only three relevant manuscripts. The first is a method to estimate risk on the X chromosome but does not use X chromosome data directly [13]. The second investigated the number of single nucleotide variants on the X chromosome and compared this number between AD cases and controls, highlighting two genes, UBE2NL and ATXN3L [14]. Only one study has performed a chromosome X-wide association study (XWAS) for AD [15], and no associations reached genome-wide significance. A recent study investigating X chromosome gene expression found that several genes (including GRIA3, GPRASP2, and GRIPAP1) were associated with slower cognitive decline in women but not men. In contrast, expression of other X chromosome genes, such as UBL4A, which encodes a protein folding factor and sorts proteins to the proteasome or to the endoplasmic reticulum, is associated with neuropathological tau burden in men but not women [16].
The aim of this study is to highlight single nucleotide polymorphisms (SNPs) on the X chromosome which are associated with AD risk, together with the genes most proximal to the XWAS-identified SNPs. To investigate potential sex differences in AD highlighted by genetics, we performed an XWAS (a genome-wide-like association study focused on the X chromosome) of AD risk by meta-analysing results from sex-stratified analyses. We conducted this in three collections of samples: (i) KRONOS/Tgen data which contains 994 AD cases and 572 controls, (ii) Brains for Dementia Research (BDR) data which contains 356 AD cases and 164 controls, and (iii) Religious Orders Study/Memory and Aging Project (ROSMAP) + Mount Sinai Brain Bank (MSBB) + Mayo Clinic Brain Bank (MAYO) data which contains 702 AD cases and 486 controls. The AD diagnosis in these data was pathologically confirmed. We used these studies to determine AD-associated SNPs which replicate in at least two studies. We then meta-analysed the results with a view to increasing power. In addition, the XWAS results for each study were used to perform a gene-based analysis to better identify genes associated with AD risk, accounting for multiple, independent associations in the gene.

Data description
The KRONOS/Tgen dataset was obtained from 21 National Alzheimer's Coordinating Center (NACC) brain banks and from the Miami Brain Bank as previously described [17][18][19][20]. The criteria for inclusion were: self-defined ethnicity of European descent, neuropathologically confirmed AD or no neuropathology present, and age of death greater than 65. Neuropathological diagnosis was defined by board-certified neuropathologists according to the standard NACC protocols [21]. Samples derived from subjects with a clinical history of stroke, cerebrovascular disease, Lewy body dementia, or comorbidity with any other known neurological disease were excluded. AD or control neuropathology was confirmed by plaque and tangle assessment, with 45% of the entire series undergoing Braak staging [22]. The cohort consists of 912 AD cases and 454 controls. Samples were de-identified and the study met human studies institutional review board and HIPAA regulations. This work is declared not human-subjects research and is IRB exempt under regulation 45 CFR 46. These data were imputed using the Michigan Imputation Server [23] with the TOPMed panel [24], and SNPs with an INFO score less than 0.7 were removed.
The Brains for Dementia Research (BDR, brainsfordementiaresearch.org.uk) data [25,26] is a longitudinal cohort of dementia samples and controls. Currently there are approximately 1200 DNA samples from brain tissue or blood, and this is expected to increase to 3200 samples with genetic information. BDR is a world-class brain tissue resource supported by the Alzheimer's Society and Alzheimer's Research UK, establishing a network of brain banks in England and Wales. In addition, other data related to cognition, general health, and lifestyle are collected every 1-5 years. The BDR data, imputed on the Michigan Imputation Server using the Minimac4 pipeline and the TOPMed reference panel [24], is available through the Dementias Platform UK (DPUK, https://portal.dementiasplatform.uk). The genotyped cohort includes 354 cases confirmed with AD as the primary dementia (age at onset >65 years) and 163 cognitively normal controls without additional neuropathology; all diagnoses were neuropathologically confirmed [15].
We harmonised the Religious Orders Study/Memory and Aging Project [27] (ROSMAP), Mount Sinai Brain Bank (MSBB) and Mayo Clinic Brain Bank (MAYO) whole-genome sequenced data into one cohort which we then analysed together. Datasets were downloaded from the AMP-AD portal via the Synapse platform and https://www.radc.rush.edu. ROSMAP is a longitudinal study investigating AD and ageing [28,29], MSBB has gene expression, genetic variant, neuropathological and proteomic data for brain specimens, and MAYO is a cohort containing genetic, neuropathological, biochemistry and cell biology data. Quality control analysis was carried out in the combined data, as described in [30]. AD cases are defined using a subject's clinical definition for AD and a Braak score of 5 or 6, and controls are defined as those without a clinical AD diagnosis and a Braak score less than or equal to 4. This sample contains 1188 individuals: 702 AD cases and 486 controls.
The demographic data for all cohorts are shown in Table 1. The ages of cases and controls are comparable between KRONOS/Tgen, BDR and ROSMAP/MAYO/MSBB.

Quality control
In all data sets, the chromosome X data was QC'ed in males and females separately. The amount of missingness in individuals and SNPs was checked, but no missingness was found. SNPs out of Hardy-Weinberg Equilibrium (p < 1e-6), tested in females [31], were removed, and SNPs with minor allele frequency (MAF) < 1% were also removed. In KRONOS/Tgen 225,873 and 227,323 SNPs, in BDR 228,716 and 230,052 SNPs, and in ROSMAP/MAYO/MSBB 167,018 and 169,794 SNPs were retained, in males and females respectively. SNPs in all datasets are in genome build 38.
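The two SNP-level filters described above can be illustrated with a minimal Python sketch. It assumes diploid genotype counts from female samples are already available, and it uses a one-degree-of-freedom chi-square test as a stand-in for the Hardy-Weinberg test, since the text does not state which test variant was applied. The function names and any example counts are hypothetical.

```python
import math

MAF_MIN = 0.01     # MAF threshold stated in the text
HWE_P_MIN = 1e-6   # Hardy-Weinberg p-value threshold stated in the text


def minor_allele_freq(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from diploid genotype counts."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg proportions.

    Assumption: a chi-square test is used here for simplicity; the
    original analysis may have used an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """Keep a SNP only if it passes both the MAF and the HWE filter."""
    return (minor_allele_freq(n_aa, n_ab, n_bb) >= MAF_MIN
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= HWE_P_MIN)
```

Testing Hardy-Weinberg proportions in females only reflects the fact that males carry a single copy of the X chromosome and so have no heterozygous genotype class.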

Chromosome X-wide Association Studies (XWAS)
The XWAS was carried out in males and females separately in Plink v1.9 [32,33] using option --xchr-model 2. The models were adjusted for age and principal components (PCs); 5 PCs were used for KRONOS/Tgen and 10 PCs were used in BDR and ROSMAP/MAYO/MSBB. The number of PCs necessary for adjustment was determined from visual inspection of PC plots.
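For readers unfamiliar with the option, PLINK 1.9 documents --xchr-model 2 as coding male X chromosome genotypes 0/2 instead of 0/1. The short Python sketch below only illustrates that coding; the function name is hypothetical and it is not part of PLINK.

```python
def x_dosage(alt_allele_count: int, is_male: bool) -> int:
    """Alternate-allele dosage for a non-pseudoautosomal X SNP.

    Males are hemizygous, so they carry 0 or 1 copies of the allele;
    under the 0/2 coding a male carrier is scored like a homozygous female.
    """
    if is_male:
        if alt_allele_count not in (0, 1):
            raise ValueError("males carry 0 or 1 alternate alleles on chromosome X")
        return 2 * alt_allele_count
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("females carry 0, 1 or 2 alternate alleles on chromosome X")
    return alt_allele_count
```

This coding is commonly interpreted as assuming complete X-inactivation, so that per-allele effect sizes in males and females are on a comparable scale before the sex-specific results are combined.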
The XWAS from males and females were meta-analysed together using GWAMA [34], which also reports differentiation and heterogeneity between results in males and females. These results were represented using a Manhattan plot using the manhattan() function in R [35]. An FDR multiple testing correction was applied to identify significantly associated SNPs.

Meta-analysis of AD XWAS
The XWAS were meta-analysed across cohorts using METAL [36]; results in males and females were meta-analysed separately, and GWAMA was then used to combine the results in males and females. These results were represented using a Manhattan plot using the manhattan() function in R [35]. An FDR multiple testing correction was applied to identify significantly associated SNPs.
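The text does not name the FDR procedure used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and computes adjusted p-values in plain Python; the function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the FDR correction in the text is Benjamini-Hochberg;
    the source only states that an FDR correction was applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A SNP would then be called chromosome-wide significant when its adjusted p-value falls below the chosen FDR level, with m equal to the number of chromosome X SNPs tested.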

Gene-based analysis
A gene-based analysis of the XWAS was carried out in MAGMA v.1.08 [37]. SNPs were assigned to genes based on gene locations from the NCBI site using a window of 35 kb upstream and 10 kb downstream, and the original data was used to estimate linkage disequilibrium (LD) between SNPs. The mean-chi2 approach was used, which averages the effect of SNPs in the gene. The KRONOS/Tgen and BDR data annotate to 800 genes and the ROSMAP/MAYO/MSBB data annotate to 665 genes. An FDR multiple testing correction was applied to identify significantly associated genes.
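The SNP-to-gene assignment and the averaging step can be sketched as follows. This is a simplified Python illustration, not MAGMA itself: the strand-aware handling of "upstream" and "downstream" is an assumption of the sketch, and the function names and coordinates are hypothetical.

```python
UPSTREAM_BP = 35_000    # 35 kb upstream window from the text
DOWNSTREAM_BP = 10_000  # 10 kb downstream window from the text


def gene_window(start: int, end: int, strand: str) -> tuple:
    """Extended gene interval; upstream lies before the start on '+' genes
    and after the end on '-' genes (assumed strand-aware behaviour)."""
    if strand == "+":
        return start - UPSTREAM_BP, end + DOWNSTREAM_BP
    if strand == "-":
        return start - DOWNSTREAM_BP, end + UPSTREAM_BP
    raise ValueError("strand must be '+' or '-'")


def snps_in_window(positions, start: int, end: int, strand: str) -> list:
    """Base positions of the SNPs that fall inside the extended interval."""
    lo, hi = gene_window(start, end, strand)
    return [pos for pos in positions if lo <= pos <= hi]


def mean_chi2(z_scores) -> float:
    """Mean of the squared SNP z-scores assigned to one gene."""
    return sum(z * z for z in z_scores) / len(z_scores)
```

The sketch stops at the test statistic. Turning it into a gene-level p-value requires the correlation between SNPs, which is why LD was estimated from the original genotype data.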

Using expression data to gain insights into genes of interest
To gain insights into how these chromosome X putative risk genes may contribute to AD, we searched a series of publicly available datasets, including our own, containing expression data for these genes from bulk and single-cell RNA-seq datasets from human and mouse [20][38][39][40][41][42][43][44][45][46][47][48][49]. We have also examined RNA samples of AD mouse models [50][51][52]. The STRING database (string-db.org) was used to assess protein-protein interaction networks for the identified putative genes. Ingenuity analysis (digitalinsights.qiagen.com) was performed with candidate genes across all mammalian species for tissues and cell types curated in Ingenuity.

XWAS results
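The XWAS results in the KRONOS/Tgen data, with males and females meta-analysed together, are presented in Supplementary Fig. 1. No SNPs exceed the chromosome-wide significance threshold, but several peaks reach suggestive significance (p < 1.1e-03). The GWAMA software provides a p value for a sex-differentiated effect. Of the top SNPs presented in Supplementary Table 1, rs5910591 has a differential effect between males and females (p = 3.2e-05); the effect is driven by females (OR = 0.48, p = 2.4e-05, Ref/Alt Allele = G/A, MAF = 0.20), and there is no significant effect in males (OR = 0.82, p = 0.10, Ref/Alt Allele = G/A, MAF = 0.21).
The Manhattan plot of the BDR data with males and females combined is shown in Supplementary Fig. 2. As in the KRONOS/Tgen data, no SNPs reach chromosome-wide significance, but a number reach suggestive significance (p < 1.1e-03). The top SNPs from these peaks can be seen in Supplementary Table 2. The top SNPs from BDR are different to those in KRONOS/Tgen, but rs186553004 and rs5913102 both replicate (OR = 0.39, p = 0.031, Ref/Alt Allele = G/C, MAF = 0.02; OR = 0.69, p = 0.009, Ref/Alt Allele = C/T, MAF = 0.18, respectively).
The ROSMAP/MAYO/MSBB XWAS results in males and females combined are shown in Supplementary Fig. 3. There are several suggestive SNPs (p < 1.4e-03), but none are chromosome-wide significant; the top SNPs from these peaks are listed in the corresponding Supplementary Table.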

Meta-analysis
Since there was apparent overlap between results, we meta-analysed all three cohorts together. The meta-analysis produced results for 264,793 SNPs. The results can be seen in Fig. 1; two peaks are chromosome-wide significant based on an FDR correction of chromosome X SNPs. Table 2 shows all SNPs which were highlighted by a single cohort and which also replicate in the meta-analysis; the effect sizes and p values in each single cohort and from the meta-analysis are presented. The Manhattan plots in males and females separately are shown in Supplementary Figs. 4 and 5, respectively. There is a peak in females which reaches chromosome-wide significance based on an FDR correction and which is not seen in males; these SNPs map to the gene NXF5.
The results across all cohorts are summarised in Fig. 2, which highlights that evidence is strongest for four genes, DDX53, IL1RAPL1, TBX22 and SH3BGRL, which show replication in at least two independent cohorts. Although the individual SNP association p values are only suggestive, we can be more confident in these findings as they replicate across multiple studies. In addition, in the meta-analysed data SNP rs5913102 in TBX22 reaches chromosome-wide significance based on an FDR correction. The LocusZoom [53] plots for these four genes in the meta-analysed (KRONOS/Tgen + BDR + ROSMAP/MAYO/MSBB) data are shown in Supplementary Fig. 6.
SNPs in Table 2 were uploaded to RegulomeDB [54]. SNPs rs5913102 and rs12848641 had rank "1f", indicating that these SNPs are likely to be located in a functional region and to affect transcription factor binding. All other SNPs had rank > 4, suggesting minimal binding evidence. We also computed a combined annotation dependent depletion (CADD) score [55], which integrates several diverse annotations, for each of these variants, but no SNPs had a score > 20, suggesting that the variants may not be functional.
Fig. 1 Manhattan plot of KRONOS/Tgen + BDR + ROSMAP/MAYO/MSBB XWAS. Each dot represents a SNP; the x-axis is the SNP's base position and the y-axis is the p-value (-log10(p)). The red line shows the chromosome X-wide significance threshold and the blue line shows the suggestive significance threshold.

We have also found SNPs significantly associated with AD in the genes identified in [16], which also used ROSMAP but with gene expression data. We replicated signals in three genes in the meta-analysis of all three cohorts, namely GRIA3 (rs6649016), GRIPAP1 and UBL4A.
The SNPs presented in Table 2 have effect sizes in the same direction in males and females (see Supplementary Table 4). For the SNPs replicating across multiple studies we also investigated the impact of APOE status on these associations by adjusting for both the number of APOE e4 alleles and the interaction between the SNP and the number of APOE e4 alleles; the p value for this model is presented in Supplementary Table 4. In general, the p-value adjusted for APOE status changed only slightly, with the largest change for the SH3BGRL gene (p = 5.2e-5, p adj = 0.014). We have identified SNPs in the NXF5 gene that are chromosome-wide significant in females but not in males (rs5944989: MAF = 0.60 and 0.58, OR = 0.62 and 0.98, p = 1.1e-05 and 0.830, in females and males, respectively; Ref/Alt Allele = A/G); see Table 3.
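The APOE-adjusted model described above can be written out explicitly. The following is a sketch, assuming a logistic regression with the same covariates (age and PCs) as the main XWAS; the symbols are introduced here for illustration and do not appear in the original.

```latex
\operatorname{logit} P(\mathrm{AD}_i = 1)
  = \beta_0 + \beta_G\, G_i + \beta_E\, E_i + \beta_{GE}\, G_i E_i
  + \boldsymbol{\gamma}^{\top} \mathbf{c}_i
```

Here $G_i$ is the SNP dosage of individual $i$, $E_i \in \{0, 1, 2\}$ is the number of APOE e4 alleles, $G_i E_i$ is the SNP-by-APOE interaction term, and $\mathbf{c}_i$ collects the assumed covariates. Under this parameterisation $\beta_G$ is the SNP effect in individuals carrying no e4 alleles.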
When meta-analysing males and females using GWAMA, a sex heterogeneity p-value is computed; the top SNPs with the smallest p-values for heterogeneity are also shown in Table 3. These SNPs show an opposite effect direction in males and females.
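To make the idea of a sex heterogeneity test concrete, the sketch below combines female and male estimates with a fixed-effect inverse-variance meta-analysis and computes Cochran's Q on one degree of freedom. It is an approximation of what such software reports, not GWAMA's implementation. The standard errors are back-calculated from the reported odds ratios and p-values for rs5944989 under the assumption of two-sided Wald tests, so the output is indicative only; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_or_and_p(odds_ratio: float, p_two_sided: float) -> float:
    """Back out the standard error of log(OR) from a two-sided Wald p-value.

    Assumption: the reported p-value comes from a Wald test; an OR of
    exactly 1 cannot be handled because log(OR) is then zero.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(math.log(odds_ratio)) / z


def sex_meta(beta_f: float, se_f: float, beta_m: float, se_m: float) -> dict:
    """Inverse-variance meta-analysis of two sexes plus Cochran's Q (1 df)."""
    w_f, w_m = 1.0 / se_f ** 2, 1.0 / se_m ** 2
    beta = (w_f * beta_f + w_m * beta_m) / (w_f + w_m)
    se = math.sqrt(1.0 / (w_f + w_m))
    q = w_f * (beta_f - beta) ** 2 + w_m * (beta_m - beta) ** 2
    return {
        "or": math.exp(beta),
        "p": math.erfc(abs(beta / se) / math.sqrt(2.0)),  # two-sided normal p
        "q": q,
        "p_het": math.erfc(math.sqrt(q / 2.0)),           # chi-square, 1 df
    }


# Reported values for rs5944989 in females and males; SEs are back-calculated.
beta_f, beta_m = math.log(0.62), math.log(0.98)
se_f = se_from_or_and_p(0.62, 1.1e-05)
se_m = se_from_or_and_p(0.98, 0.830)
result = sex_meta(beta_f, se_f, beta_m, se_m)
print(result)
```

In this back-calculated example the pooled odds ratio lies between the two sex-specific estimates and the heterogeneity p-value is small, which is consistent with a signal driven by females.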

Gene-based analysis
The gene-based analysis in the KRONOS/Tgen and ROSMAP/MAYO/MSBB data does not provide any genes which surpass the gene-wide threshold (based upon the total number of chromosome X genes analysed). In the KRONOS/Tgen data, the third most significant gene is SLC25A5 (p = 0.0048, p fdr = 0.74), which was also highlighted as a proximal gene to a SNP identified in the XWAS. In the BDR data, two genes reached gene-wide significance: BGN and HAUS7 (p fdr = 0.004 for both genes). HAUS7 was also identified as chromosome-wide significant from the XWAS in the meta-analysed data. A gene-based analysis from the meta-analysis also did not produce any gene-wide significant results.

Fig. 2 Summary of findings across all cohorts. The four panels represent each of the four genes (grey) highlighted in this study, the relevant identified SNPs (blue/green) for each gene, and the effect sizes and p-values in the different cohorts (red/orange/yellow) and the meta-analysis of all cohorts (pink).

Numbers in bold are effects with a p value less than 0.05.

E. Simmonds et al.

Using expression data to gain insights into genes of interest
We investigated the meta-analysis XWAS significant genes (Table 2) for their relevance to neurodegeneration by searching several public datasets containing expression data from bulk and single-cell RNA-seq datasets from human and mouse samples to assess conservation of gene responses between species. We saw that DDX53 was expressed at low levels in several cell types across the human brain but showed highest expression in astrocytes (Supplementary Fig. 7A). In some human AD studies DDX53 showed increased expression in AD compared to age-matched controls (Supplementary Fig. 7B). IL1RAPL1 was expressed at highest levels in human oligodendrocytes and neurons (Supplementary Fig. 8A) and showed decreased expression in mouse models of AD compared to age-matched wild-type mice (Supplementary Fig. 8C). SH3BGRL was expressed at the highest levels in human microglia (Supplementary Fig. 9A), and showed decreased expression in AD compared to age-matched controls in one data set (Supplementary Fig. 9B) and increased expression in another (Supplementary Fig. 9C). In mouse models of AD, SH3BGRL showed increased expression compared to age-matched controls (Supplementary Fig. 9D, E). Data were not available for TBX22, which was therefore likely lowly expressed in the datasets examined. We saw that these genes of interest could be linked to known familial neurodegenerative disease genes (APP and HTT) and AD risk genes (APOE and PLCG2) from prior experimental studies using the Ingenuity database (Fig. 3). BTK is an additional gene which was highlighted in only one study but shows interesting expression results. BTK was expressed at the highest levels in human microglia (Supplementary Fig. 10A). BTK showed increased expression in AD compared to age-matched controls (Supplementary Fig. 10B), and increased expression in mouse models of AD (Supplementary Fig. 10E). In the STRING database, the BTK-based network includes the PLCG2, SYK, TLR4, TGFB1 and TREM2 genes, which belong to microglial pathways important in AD (Supplementary Fig. 11). Thus, the new chromosome X genes we have identified are likely to contribute to AD by modifying existing pathways that are known to control AD risk.

DISCUSSION
This study, which uses a pathologically confirmed diagnosis of AD, identifies four potential genes, DDX53, IL1RAPL1, TBX22 and SH3BGRL, associated with AD, which replicate across at least two of the sub-studies.
TBX22 has previously been shown to be associated with cleft lip and cleft palate [56]. The SNP in TBX22 has a RegulomeDB [54] rank of "1f", indicating that the SNP is likely to be located in a functional region and to affect transcription factor binding.
SH3BGRL has been linked to Parkinson's disease, where higher expression is shown in cases compared to controls [57], and is highly expressed in breast cancers [58]. In addition, a proteome analysis in AD has identified increased SH3 protein in the brain [59].
DDX53 is an intronless gene which is linked to Autism Spectrum Disorder (ASD), although the DDX53 mutations were shown to have no effect on synaptic transmission [60].
IL1RAPL1 is a synaptic adhesion molecule located at the postsynaptic membrane; it regulates dendrite formation and impacts the activity of IL-1β on dendrite morphology [61]. The literature suggests that other genes in the IL1 family have some relevance to neurodegenerative or brain disorders. IL1RAP is highly expressed in the brain [62], and SNPs located in IL1RAP were found to be associated with longitudinal change in brain amyloid [63]. Associations were also found between the most significant SNP in IL1RAP and progression from MCI to AD, cognitive decline, temporal cortex atrophy and microglial activity [63]. The variant rs1921622 in IL1RL1 has been shown to lower the risk effects of APOE-ε4 in female AD patients by lowering soluble ST2 [64].
Several genes were identified as being associated with sex-differentiated effects. NXF5 was found to be chromosome-wide significant in females but not in males. NXF5 has been linked to intellectual disability [65] and is known to be involved in brain development [66]. SNPs in the SPIN4, LOC105373237, ZC3H12B and LOC105373347 genes were identified as the most significant sex-heterogeneous SNPs in the XWAS meta-analysis. The SPIN4 gene inhibits cell proliferation, binds specific histone modifications and negatively regulates body growth [67]. ZC3H12B has been identified as being associated with AD using a Bayesian genome-wide transcriptome-wide association study [68].
Three genes identified in a previous study [16] were replicated in the meta-analysis XWAS of all three cohorts; these genes were GRIA3, GRIPAP1 and UBL4A. As well as being associated with slower cognitive decline in women but not men, GRIA3 is known to be involved in memory and learning and is highly correlated with HLA-DRB5, which is associated with AD [69].
Two additional gene-wide significant genes were identified in the BDR data in the gene-based analysis; these genes are BGN and HAUS7. BGN has been associated with amyloid metabolism in AD [70] and inflammatory state in obesity and type 2 diabetes [71], and is known to be a central gene in a network in the brain in response to fructose consumption [72]. HAUS7 is necessary for cytokinesis and regulates mitotic spindle and centrosome integrity (https://www.ncbi.nlm.nih.gov/). SNPs in HAUS7 also reach chromosome-wide significance after FDR multiple testing correction in the meta-analysis of all three cohorts.
BTK was found to be strongly connected to PLCG2 [73]. PLCG2 activation leads to B cell receptor (BCR) signalling, and BTK is in the BCR signalling complex. In the protein network, Toll Like Receptor 4 (TLR4) connects BGN to the BTK network [73,74], including PLCG2 and epigenetic silencing of the immunosuppressive response. PLCG2 is a well validated AD risk gene [75,76]. The BTK-BGN network includes strong microglial genes (PLCG2, SYK, TLR4, TGFB1), and TREM2 links strongly to these [76]. DDX53 shows highest expression in astrocytes, IL1RAPL1 is most highly expressed in oligodendrocytes and neurons, and SH3BGRL is most highly expressed in microglia. Collectively, the expression data suggest that the putative chromosome X risk genes act through different cell types and pathways to modulate risk for AD, with some genes increasing risk and some being protective.
The strength of this study is the utilisation of available chromosome X data in the KRONOS/Tgen, BDR and ROSMAP/MAYO/MSBB data in relation to AD, which until now has been understudied. It also uses three independent cohorts with a pathologically confirmed phenotype to investigate potential replication and to increase power by meta-analysing independent XWAS together.
The limitation of this study is the relatively small sample size of the available studies; however, we have attempted to improve power by meta-analysing these cohorts together. Despite the small sample sizes, we report consistent results across studies which reinforce our findings.
In conclusion, this study has highlighted several potential target genes on chromosome X associated with AD risk which may be relevant for further study, with the end goal of identifying differences in AD progression between males and females and potentially developing sex-stratified therapeutics.

DATA AVAILABILITY
The XWAS summary statistics for the meta-analysis of KRONOS/Tgen and BDR are available at the DRI GitHub repository (https://github.com/UKDRI/XWAS_AD_summary_stats). The Manhattan plot for this meta-analysis is presented in a supplementary figure.

Fig. 3 The Ingenuity Pathway Analysis (IPA) explores candidate genes across various mammalian species within tissues and cell types catalogued in Ingenuity. Input genes are denoted by gene symbols encircled with grey filled nodes, while solid lines signify direct interactions, such as protein-protein interactions or phosphorylation events.

Table 1. Demographic summary for all cohorts.